Extracting French-Japanese Word Pairs from Bilingual Corpora based on Transliteration Rules
Abstract
Previous work has shown that transliteration rules are highly useful for extracting Japanese Katakana and English word pairs. For Japanese-French pairs, however, the method is not guaranteed to work, because very few Japanese Katakana words are borrowed directly from French. In this paper we show that Japanese Katakana and French word pairs can be extracted, based on transliteration, from loosely aligned Japanese-French bilingual corpora. The method applies all the existing transliteration rules to each mora unit in a Katakana word and extracts, as the translation, the French word that matches or partially matches one of the resulting transliteration candidates. For instance, if 'グラフ' appears in the Japanese part of a bilingual corpus, we generate transliteration candidates for it and identify similar words in the French part of the corpus. The method performed reasonably well, achieving 80% precision at 20% recall. We also observed that Japanese-English transliteration rules worked well for extracting Katakana-French word pairs.
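The candidate-generation and matching step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the rule table `RULES`, the function names, the similarity measure (`difflib.SequenceMatcher`) and the 0.8 threshold are all assumptions made for the example, and treating each Katakana character as one mora ignores small kana and the long-vowel mark.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical rule table (not from the paper): each Katakana mora maps to
# several possible Latin spellings.
RULES = {
    "グ": ["g", "gu"],
    "ラ": ["ra", "la"],
    "フ": ["f", "fu", "ph", "phe"],
}


def candidates(katakana):
    """Apply every rule to every mora and return all combinations.

    Simplification: one character is treated as one mora, and a mora
    missing from RULES raises KeyError.
    """
    options = [RULES[mora] for mora in katakana]
    return {"".join(parts) for parts in product(*options)}


def best_match(katakana, french_words, threshold=0.8):
    """Return (word, score) for the most similar French word, or None.

    A score of 1.0 is an exact match; lower scores at or above the
    threshold stand in for the "partial match" case.
    """
    best = None
    for cand in candidates(katakana):
        for word in french_words:
            score = SequenceMatcher(None, cand, word.lower()).ratio()
            if score >= threshold and (best is None or score > best[1]):
                best = (word, score)
    return best


if __name__ == "__main__":
    print(sorted(candidates("グラフ")))
    print(best_match("グラフ", ["graphe", "table", "figure"]))
```

With this toy table, 'グラフ' yields 16 candidates, including `graphe`, so a French word list containing "graphe" produces an exact match. A real system would need a full mora-level rule set and a matching criterion tuned on data.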
Similar resources
Automatic Extraction of Translational Japanese-KATAKANA and English Word Pairs
A method is proposed for automatically extracting translational Japanese-KATAKANA and English word pairs from bilingual corpora. The method applies all the existing transliteration rules to each mora unit in a KATAKANA word, and extracts the English word which matches or partially matches one of these transliteration candidates as the translation. For instance, if there is a word ‘グラフ’ (graph) in Japan...
Extracting Transliteration Pairs from Comparable Corpora
Transliterating words and names from one language to another is a frequent and highly productive phenomenon. For example, the English word cache is transliterated in Japanese as キャッシュ “kyasshu”. In many cases, recent transliterations are not recorded in machine-readable dictionaries, so it is impossible to rely on dictionary lookup to find transliteration equivalents. In this paper we describe a meth...
Bilingual Dictionary Construction with Transliteration Filtering
In this paper we present a bilingual transliteration lexicon of 170K Japanese-English technical terms in the scientific domain. Translation pairs are extracted by filtering a large list of transliteration candidates generated automatically from a phrase table trained on parallel corpora. Filtering uses a novel transliteration similarity measure based on a discriminative phrase-based machine tra...
Automatic extraction of bilingual word pairs using inductive chain learning in various languages
In this paper, we propose a new learning method for extracting bilingual word pairs from parallel corpora in various languages. In cross-language information retrieval, the system must deal with various languages. Therefore, automatic extraction of bilingual word pairs from parallel corpora with various languages is important. However, previous works based on statistical methods are insufficien...
English-Chinese Transliteration Word Pair Extraction from Parallel Corpora
Bilingual dictionary construction is a time-consuming job; therefore many studies have recently focused on automatically constructing bilingual dictionaries from bilingual texts. In this paper, we propose two novel approaches called dynamic window and tokenizer based on statistical machine transliteration model to efficiently extract English-Chinese transliteration pairs from parallel corpora. ...
Publication date: 2002